{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "GPT: Generative Pre-Training\n",
    "- GPT的模型核心组件选择了Transformer，区别于之前的其他一些模型包括ELMo的LSTM。这确实给模型带来了极大的性能和速度提升。\n",
    "- 为了方便将语言模型的能力transfer到下游的各种任务上去，GPT对模型的输入进行了规范，称为 traversal-style input transformations。\n",
    "- GPT对词典使用了 bytepair encoding (BPE) subword units来作为基本单元，即不能将句子长度被增加太多而降低模型性能，也能有效减少词典的大小以减少模型参数量\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- GPT模型使用经典的two stage training。第一个阶段，将一个大容量的语言模型在很大的无监督语料上进行预训练。第二个阶段，在特定任务的监督数据上进行finetune。\n",
    "- GPT使用了标准的语言模型目标，优化某个词在其前面k个词出现情况下的条件概率。\n",
    "- GPT在特定任务的训练时，会把语言模型目标的误差一起加在总误差中联合训练，以提升模型的泛化能力，缓解灾难性遗忘\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- `GPT` 使用 `transformer` 的解码模块；`BERT` 则使用其编码模块。\n",
    "- `GPT` 与传统的语言模型相同，每次输出一个 `token`；然后该 `token` 被加入到输入序列中，新序列作为下一步的输入：`auto-regression`\n",
    "![](images/gpt-2-autoregression-2.gif)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- `GPT` 使用的 `Masked Self-Attention`，只允许一个位置 `attend` 当前位置及其左边的所有 `token`；`BERT` 则允许左右两边的。\n",
    "\n",
    "\n",
    "   \n",
    "1. 给模型输入起始标记 `<s>`，模型输出一个全词汇表的概率分布，概率最高的则为，第一个输出单词；\n",
    "    - 总是选择概率最高的单词，模型可能会陷入循环，总是输出相同的单词；`top-k` 参数，在概率最高的前 `k` 个单词中采样；\n",
    "2. 将第一步的输出添加到输入向量中，进行下一步预测"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "- 首先在一个嵌入矩阵中查找到 `<s>` 对应的向量；再加上位置向量"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-04-08T09:30:54.333305Z",
     "start_time": "2020-04-08T09:30:54.327574Z"
    }
   },
   "source": [
    "**参考链接**\n",
    "1. http://jalammar.github.io/illustrated-gpt2/\n",
    "2. https://amaarora.github.io/2020/02/18/annotatedGPT2.html"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# GPT 实现"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-10T02:58:10.219900Z",
     "start_time": "2020-06-10T02:58:10.214168Z"
    }
   },
   "outputs": [],
   "source": [
    "import tensorflow as tf\n",
    "from tensorflow import keras\n",
    "from tensorflow.keras import layers\n",
    "from tensorflow.keras.layers.experimental.preprocessing import TextVectorization\n",
    "import numpy as np\n",
    "import os\n",
    "import re\n",
    "import string\n",
    "import random"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 自注意力"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-10T03:02:27.660033Z",
     "start_time": "2020-06-10T03:02:27.654753Z"
    }
   },
   "outputs": [],
   "source": [
    "class MultiHeadSelfAttention(layers.Layer):\n",
    "    def __init__(self, embed_dim, num_heads=8):\n",
    "        super(MultiHeadSelfAttention, self).__init__()\n",
    "        "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.7"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
